Differentially Private Releasing via Deep Generative Model (Technical Report)
نویسندگان
چکیده
Privacy-preserving releasing of complex data (e.g., image, text, audio) represents a long-standing challenge for the data mining research community. Due to rich semantics of the data and lack of a priori knowledge about the analysis task, excessive sanitization is often necessary to ensure privacy, leading to significant loss of the data utility. In this paper, we present dp-GAN, a general private releasing framework for semantic-rich data. Instead of sanitizing and then releasing the data, the data curator publishes a deep generative model which is trained using the original data in a differentially private manner; with the generative model, the analyst is able to produce an unlimited amount of synthetic data for arbitrary analysis tasks. In contrast of alternative solutions, dp-GAN highlights a set of key features: (i) it provides theoretical privacy guarantee via enforcing the differential privacy principle; (ii) it retains desirable utility in the released model, enabling a variety of otherwise impossible analyses; and (iii) most importantly, it achieves practical training scalability and stability by employing multi-fold optimization strategies. Through extensive empirical evaluation on benchmark datasets and analyses, we validate the efficacy of dp-GAN. (The source code and the data used in the paper is available at: https://github.com/alps-lab/dpgan)
منابع مشابه
Differentially Private Generative Adversarial Network
Generative Adversarial Network (GAN) and its variants have recently attracted intensive research interests due to their elegant theoretical foundation and excellent empirical performance as generative models. These tools provide a promising direction in the studies where data availability is limited. One common issue in GANs is that the density of the learned generative distribution could conce...
متن کاملCoupling Random Orthonormal Projection with Gaussian Generative Model for Non-Interactive Private Data Release
A key challenge facing the design of differentially private systems in the non-interactive setting is to maintain the utility of the released data for general analytics applications. To overcome this challenge, we propose the PCA-Gauss system that leverages the novel combination of dimensionality reduction and generative model for synthesizing differentially private data. We present multiple al...
متن کاملDifferentially Private Learning with Kernels
In this paper, we consider the problem of differentially private learning where access to the training features is through a kernel function only. As mentioned in Chaudhuri et al. (2011), the problem seems to be intractable for general kernel functions in the standard learning model of releasing different private predictor. We study this problem in three simpler but practical settings. We first...
متن کاملDeepWriting: Making Digital Ink Editable via Deep Generative Modeling
Digital ink promises to combine the flexibility and aesthetics of handwriting and the ability to process, search and edit digital text. Character recognition converts handwritten text into a digital representation, albeit at the cost of losing personalized appearance due to the technical difficulties of separating the interwoven components of content and style. In this paper, we propose a novel...
متن کاملWhai: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling
To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The generative network of WHAI has a hierarc...
متن کامل